rasterize: hoist poly_props/poly_global cupy.asarray above all_touched (#2506)#2510
Conversation
xarray-contrib#2506) Both `_run_cupy` and `_rasterize_tile_cupy` previously called `cupy.asarray(poly_props)` and `cupy.asarray(poly_global)` twice when `all_touched=True` -- once for the scanline `poly_launch` tuple and once for the supercover `boundary_launch` tuple. The two launches operate on the same tile, so the second upload re-transferred identical bytes for every dask tile. Stage the device buffers above the conditional so both launches share them. For 10k polygons / 8 cols the per-tile transfer cost drops from 0.218 ms to 0.092 ms (2.4x) and 720 KB of redundant PCIe traffic per tile is eliminated; for a 100-tile dask+cupy raster that is ~13 ms and 72 MB saved end to end. Adds 12 regression tests in test_rasterize_props_hoist_2506.py: - 4 AST-level assertions that each function calls `cupy.asarray(poly_props/poly_global)` exactly once. - 5 cupy vs numpy all_touched parity tests covering last/first/max/min/sum merges. - 3 dask+cupy smoke tests that exercise the hoisted upload through every per-tile launch. The dask+cupy + all_touched pixel-level parity gap (boundary segments crossing tile borders behave differently than the eager numpy path) predates this fix and is not addressed here.
Tighten the props/global hoist guard added in xarray-contrib#2506 so the host-to-device transfer is skipped when neither the scanline nor the supercover boundary launch will consume it (no in-tile edges and not all_touched). Without this guard the hoist could upload poly_props/poly_global for a tile whose polygons fall entirely outside the raster bounds even when all_touched is False, which the pre-hoist code never did. In _rasterize_tile_cupy the upload also has to move below _extract_edges so the guard can read len(edge_y_min).
… helper The _to_numpy helper in the xarray-contrib#2506 regression test carried a comment referencing a sweep skill's authoring rule about .compute(). That note belongs in the agent's prompt, not committed test code.
|
Self-review pass via /review-pr. 0 blockers, 1 suggestion, 1 nit, both addressed on this branch. S1: gate the hoist on actual usage As originally landed, the hoist uploads N1: strip agent prompt residue from the test The Tests: 464 passed, 2 skipped (full |
Summary
_run_cupyand_rasterize_tile_cupypreviously transferredpoly_propsandpoly_globalto the GPU twice whenall_touched=True(once for the scanline launch, once for the supercover boundary launch). Hoist thecupy.asarray()calls above the scanline / boundary conditional so both launches share the same device buffer.Test plan
test_rasterize_props_hoist_2506.pyconfirm each function callscupy.asarray(poly_props/poly_global)exactly once.last/first/max/min/summerges underall_touched=True.Notes
The dask+cupy +
all_touchedpixel-level parity gap (boundary segments crossing tile borders behave differently than the eager numpy path) predates this fix and is not addressed here. The smoke tests assert the hoisted launches still produce a populated raster rather than asserting full numpy parity.Discovered via
/deep-sweepperformance pass onrasterize(2026-05-27).